Importing necessary libraries for the assignment
import networkx as nx
import numpy as np
import pandas as pd
import pydot as pt
import pygraphviz as pgv
import sys
import matplotlib.pyplot as plt
I have used the pandas library to load the CSV file as a pandas DataFrame object.
df = pd.read_csv("HW1_asset_prices.csv")
I have calculated the correlation matrix from the given asset prices with the corr() function of the pandas library, and assigned it to a variable called correlation.
correlation = df.corr()
correlation.head()
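To illustrate what corr() returns, here is a minimal sketch on a hypothetical toy table (the asset names AAA, BBB, CCC and their prices are made up and are not taken from HW1_asset_prices.csv). By default corr() computes the pairwise Pearson correlation between columns.

```python
import pandas as pd

# Hypothetical toy prices for three assets (illustration only)
toy_prices = pd.DataFrame({
    "AAA": [10.0, 11.0, 12.0, 13.0],
    "BBB": [20.0, 22.0, 24.0, 26.0],  # moves exactly with AAA
    "CCC": [8.0, 7.0, 6.0, 5.0],      # moves exactly against AAA
})

# Pairwise Pearson correlation between the columns
toy_correlation = toy_prices.corr()
print(toy_correlation.round(2))
```

The result is a square, symmetric DataFrame indexed by asset name on both axes, with ones on the diagonal.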
Next, I have transformed the correlation matrix from a DataFrame object into a NumPy matrix.
correlation_matrix = np.asmatrix(correlation)
We can see that the correlation_matrix variable is a matrix holding the calculated asset correlation coefficients.
correlation_matrix
In this task I have created a graph with the layout settings that, in my opinion, are the best fit for this kind of dataset. Under the graph, I have placed an explanation of the choice. In the graph, assets are represented as nodes, and connections between assets are shown as lines.
First, I have created a networkx graph object from our correlation matrix.
# from_numpy_matrix was removed in NetworkX 3.0; from_numpy_array takes a plain ndarray
G = nx.from_numpy_array(np.asarray(correlation_matrix))
Next, I have stored the names of the assets, which will later be useful for labeling the nodes in our graph. I have removed the EOD~ prefix from every asset name by keeping only the last six characters, as I found the prefix irrelevant.
assets_names = correlation.index.values
assets_names = [x[-6:] for x in assets_names]
I have relabeled the nodes, from numbers (which is how the networkx graph object stores them) to the asset names prepared earlier.
G = nx.relabel_nodes(G, lambda x: assets_names[x])
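The two steps above (matrix to graph, then relabeling) can be sketched on a small example. This is only an illustration under stated assumptions: the 3x3 matrix and the names AAA, BBB, CCC are hypothetical, and the sketch uses nx.from_numpy_array, the array-based constructor available in current NetworkX releases. It also shows that the ones on the diagonal become self-loops.

```python
import networkx as nx
import numpy as np

# Hypothetical symmetric correlation matrix for three assets
toy_matrix = np.array([
    [1.0, 0.9, -0.2],
    [0.9, 1.0, 0.4],
    [-0.2, 0.4, 1.0],
])
toy_names = ["AAA", "BBB", "CCC"]

# Every non-zero entry becomes an edge whose 'weight' is the matrix value
toy_graph = nx.from_numpy_array(toy_matrix)

# Replace the integer node ids 0, 1, 2 with the asset names
toy_graph = nx.relabel_nodes(toy_graph, dict(enumerate(toy_names)))

print(sorted(toy_graph.nodes()))
print(toy_graph["AAA"]["BBB"]["weight"])
# The diagonal entries (correlation of an asset with itself) become self-loops
print(nx.number_of_selfloops(toy_graph))
```

Passing a dictionary to relabel_nodes is equivalent to the lambda used above; either form maps each integer id to its asset name.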
I have created a function that displays the graph with customized settings, so that the graph is visually readable.
def display(graph):
    plt.subplots(figsize=(20, 20))
    # Setting layout
    layout = nx.circular_layout(graph)
    # Nodes drawing
    nx.draw_networkx_nodes(graph, pos=layout, node_color='red', node_size=700, alpha=0.9)
    # Edges drawing
    nx.draw_networkx_edges(graph, pos=layout, edge_color='grey', alpha=0.5)
    # Label styling
    nx.draw_networkx_labels(graph, pos=layout, font_size=12, font_weight='bold')
    plt.show()
With the display function, I can reproduce the graph structure for any given data using the same settings.
display(G)
WHY THIS LAYOUT?
I have chosen the circular layout. Every asset is connected to every other asset, and edge weights are not shown at this point, so the most convenient arrangement is a circle with even distances between neighboring nodes. This gives a regular, symmetric mesh and improves visibility compared with the other layouts I reviewed. Because the nodes are spread evenly around the circle, every node is easy to spot and none overlap. Edges are also easier to trace than in other layouts: they cross, but they do not run on top of one another. It is still a challenge to follow and distinguish individual edges, because they all have exactly the same width, but once we complement the graph with additional information (edge width based on edge weight) it will become much clearer.
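The even spacing described above can be checked numerically. The following is a minimal sketch on a hypothetical complete graph with six nodes (not the asset graph): with the default arguments, nx.circular_layout places the nodes on a circle of radius 1 at equal angular steps.

```python
import networkx as nx
import numpy as np

# Hypothetical complete graph: every node connected to every other node
toy_graph = nx.complete_graph(6)
positions = nx.circular_layout(toy_graph)

# Distance of each node from the centre of the layout
radii = [float(np.hypot(x, y)) for x, y in positions.values()]

# Angle of each node, sorted, and the gaps between neighbouring angles
angles = sorted(float(np.arctan2(y, x)) % (2 * np.pi) for x, y in positions.values())
gaps = np.diff(angles)

print(np.round(radii, 3))
print(np.round(gaps, 3))
```

Since no three points on a circle are collinear, two edges can cross but never lie along the same line, which is why the mesh stays traceable.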
I have done this task in two steps. The first step was to create a graph based on the previous display function, adding the edge weights from the correlation matrix. The next step is to adjust the size of the nodes according to a particular criterion.
def display_weights(graph):
    plt.subplots(figsize=(20, 20))
    # Setting layout
    layout = nx.circular_layout(graph)
    # Creating a list of weights
    edges_weights = graph.edges(data=True)
    weights = []
    for a, b, c in edges_weights:
        weights.append(c['weight'])
    # Setting weights width size
    weights = [x**12 for x in weights]
    # Nodes drawing
    nx.draw_networkx_nodes(graph, pos=layout, node_color='red', node_size=800, alpha=0.8)
    # Edges drawing
    nx.draw_networkx_edges(graph, pos=layout, width=weights, edge_color='grey', alpha=0.7)
    # Label styling
    nx.draw_networkx_labels(graph, pos=layout, font_size=12, font_weight='bold')
    plt.show()
I had to adjust the edge weights, because the raw values from the correlation matrix were too small and too similar to produce visibly different edge thicknesses. I therefore rescaled the weights by raising them to the power of 12. This makes small values much smaller, so some edges are barely visible, but it differentiates the edges and gives a clearer picture.
display_weights(G)
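The effect of raising the weights to the power of 12 is easiest to see on a few numbers. The sample correlations below are hypothetical and chosen only for illustration.

```python
# Hypothetical correlation values, used as edge weights
sample_correlations = [0.5, 0.7, 0.9, 0.95, 1.0]

# Same rescaling as in display_weights: raise each weight to the power of 12
widths = [c ** 12 for c in sample_correlations]

for c, w in zip(sample_correlations, widths):
    print(f"correlation {c:.2f} -> edge width {w:.4f}")

# An even exponent also removes the sign, so a negative correlation
# gets the same width as a positive one of equal magnitude
negative_width = (-0.9) ** 12
print(f"correlation -0.90 -> edge width {negative_width:.4f}")
```

Weights close to 1 keep a width close to 1, while moderate correlations shrink towards zero, which is what separates strong edges from weak ones in the picture.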
In our undirected graph all nodes have the same degree (every node is connected to every other node, giving degree 39). Therefore, sizing nodes on the basis of degree would not differentiate them. We have to find another criterion that allows us to distinguish node sizes.
There are several criteria we could use: threshold, positive, negative.
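The positive and negative criteria are not developed further in this notebook, so the following is only a sketch of how they might look. The helper split_by_sign, the toy graph, and the decision to ignore self-loops are all assumptions made for illustration.

```python
import networkx as nx

def split_by_sign(graph):
    """Return (positive, negative) graphs, split by the sign of 'weight'."""
    positive, negative = nx.Graph(), nx.Graph()
    for a, b, data in graph.edges(data=True):
        if a == b:
            continue  # assumption: self-correlations (self-loops) are ignored
        weight = data["weight"]
        if weight > 0:
            positive.add_edge(a, b, weight=weight)
        elif weight < 0:
            negative.add_edge(a, b, weight=weight)
    return positive, negative

# Hypothetical toy graph, not the assignment data
toy_graph = nx.Graph()
toy_graph.add_weighted_edges_from([
    ("AAA", "BBB", 0.9),
    ("AAA", "CCC", -0.6),
    ("BBB", "CCC", 0.2),
    ("AAA", "AAA", 1.0),  # self-loop, as produced by the matrix diagonal
])

positive_graph, negative_graph = split_by_sign(toy_graph)
print(dict(positive_graph.degree()))
print(dict(negative_graph.degree()))
```

The degree of a node in either graph could then serve as a node-size criterion, in the same way as the threshold criterion described next.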
Threshold
Here we could apply an arbitrarily chosen threshold to the correlation matrix values. Let us assume we are only interested in strong correlations, which we take here to mean correlation coefficients of at least 0.7 in absolute value (so both strongly positive and strongly negative values qualify). We will therefore add every edge at or above the threshold to a newly created graph called T. Then, on the basis of the edges that remain, we will calculate the degree of each node and set its size according to that degree.
First, let us create graph T with the edges whose correlation coefficient is at least 0.7 in absolute value.
# New empty graph T
T = nx.Graph()
# Adding edges above 0.7 threshold to the graph T
edges_weights = G.edges(data=True)
for a, b, c in edges_weights:
    if abs(c['weight']) >= 0.7:
        # We are adding edges above our threshold, from graph G to graph T
        T.add_edge(a, b)
        # We are adding weights, corresponding to edges (a,b), above our threshold, from graph G to graph T
        T[a][b]['weight'] = abs(c['weight'])
Let us look at graph T.
display_weights(T)
We can see that graph T contains only the edges whose weight is at or above the set threshold, in our case 0.7. We can also see that the nodes in T have different degrees, unlike in graph G, where all nodes have the same degree. Let us store these degrees in a dictionary called node_degrees, with assets as keys and degrees as values.
node_degrees = dict(T.degree())
node_degrees
Having the node degrees, we have to create a list that can be read by the node_size parameter of the networkx function draw_networkx_nodes(... node_size= ...). We will rescale the node sizes in order to obtain a visually informative graph.
node_sizes = [degree**2.7 for degree in node_degrees.values()]
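As a small illustration of the degree-to-size mapping, here is a sketch on a hypothetical four-node graph (the names and edges are made up). The sizes are built in the order of toy_graph.nodes(), since draw_networkx_nodes matches a node_size list to the nodes in that order by default.

```python
import networkx as nx

# Hypothetical graph of strong correlations between four assets
toy_graph = nx.Graph()
toy_graph.add_edges_from([
    ("AAA", "BBB"),
    ("AAA", "CCC"),
    ("AAA", "DDD"),
    ("BBB", "CCC"),
])

# Degree of every node, keyed by asset name
toy_degrees = dict(toy_graph.degree())

# Same rescaling as above: degree raised to the power of 2.7,
# listed in the same order as toy_graph.nodes()
toy_sizes = [toy_degrees[name] ** 2.7 for name in toy_graph.nodes()]

for name, size in zip(toy_graph.nodes(), toy_sizes):
    print(f"{name}: degree {toy_degrees[name]}, size {size:.1f}")
```

The exponent stretches the differences: a node with three edges is drawn far larger than a node with one edge, rather than just three times larger.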
Now we are ready to use the node_sizes list in a modified version of our function display_weights().
def display_node_sizes(graph):
    plt.subplots(figsize=(20, 20))
    # Setting layout
    layout = nx.circular_layout(graph)
    # Creating a list of weights
    edges_weights = graph.edges(data=True)
    weights = []
    for a, b, c in edges_weights:
        weights.append(c['weight'])
    # Setting weights width size
    weights = [x**12 for x in weights]
    # Nodes drawing (node_sizes is the list computed above from the degrees in graph T)
    nx.draw_networkx_nodes(graph, pos=layout, node_color='red', node_size=node_sizes, alpha=0.8)
    # Edges drawing
    nx.draw_networkx_edges(graph, pos=layout, width=weights, edge_color='grey', alpha=1)
    # Label styling
    nx.draw_networkx_labels(graph, pos=layout, font_size=12, font_weight='bold')
    plt.show()
display_node_sizes(T)
We have obtained a visually informative graph. Node sizes depend on node degrees, so we can clearly see that the bigger the node, the more edges it has, whereas small nodes are barely visible.
General function
I have prepared a general function for creating such graphs, with parameters for adjusting the threshold, the edge scaling and the node scaling.
def display_all(graph, threshold, edge_scale, node_scale):
    plt.subplots(figsize=(20, 20))
    # New empty graph S
    S = nx.Graph()
    edges_weights = graph.edges(data=True)
    for a, b, c in edges_weights:
        if abs(c['weight']) >= threshold:
            S.add_edge(a, b)
            S[a][b]['weight'] = abs(c['weight'])
    # Setting node size
    node_sizes = [node**node_scale for node in dict(S.degree()).values()]
    # Creating a list of weights
    edges_weights_S = S.edges(data=True)
    weights = []
    for c, d, e in edges_weights_S:
        weights.append(e['weight'])
    # Setting weights width size
    weights = [x**edge_scale for x in weights]
    # Setting layout
    layout = nx.circular_layout(S)
    # Nodes drawing
    nx.draw_networkx_nodes(S, pos=layout, node_color='red', node_size=node_sizes, alpha=0.8)
    # Edges drawing
    nx.draw_networkx_edges(S, pos=layout, width=weights, edge_color='grey')
    # Label styling
    nx.draw_networkx_labels(S, pos=layout, font_size=12, font_weight='bold')
    plt.show()
display_all(G,0.7,12,2.7)
As we can see, the resulting graph is the same as graph T created manually before, so the function works properly. Now we can use the function parameters (threshold, edge scaling, node scaling) to adjust the settings and obtain the best visual result.
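One possible refinement, shown here only as a sketch, is to separate the filtering step from the drawing step so that the thresholded graph can be inspected without plotting. The helper name filter_by_threshold and the toy graph are hypothetical; the filtering rule is the same as in display_all.

```python
import networkx as nx

def filter_by_threshold(graph, threshold):
    """Keep only edges with |weight| >= threshold, storing the absolute weight."""
    filtered = nx.Graph()
    for a, b, data in graph.edges(data=True):
        if abs(data["weight"]) >= threshold:
            filtered.add_edge(a, b, weight=abs(data["weight"]))
    return filtered

# Hypothetical toy graph, not the assignment data
toy_graph = nx.Graph()
toy_graph.add_weighted_edges_from([
    ("AAA", "BBB", 0.9),
    ("AAA", "CCC", -0.75),
    ("BBB", "CCC", 0.3),
])

strong = filter_by_threshold(toy_graph, 0.7)
print(sorted(strong.edges()))
print(dict(strong.degree()))
```

The returned graph could then be passed to any of the drawing functions, and different thresholds can be compared by looking at the degree dictionaries alone.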
def display_using_graphviz(graph):
    plt.subplots(figsize=(20, 20))
    # Setting layout with added GRAPHVIZ function, computed on the graph being drawn
    layout = nx.nx_agraph.graphviz_layout(graph, prog='twopi')
    # Nodes drawing
    nx.draw_networkx_nodes(graph, pos=layout, node_color='red', node_size=800, alpha=0.9)
    # Edges drawing
    nx.draw_networkx_edges(graph, pos=layout, edge_color='blue', alpha=0.5)
    # Label styling
    nx.draw_networkx_labels(graph, pos=layout, font_size=12, font_color='black', font_family='sans-serif', font_weight='bold')
    # Displaying the graph figure
    plt.show()
display_using_graphviz(T)
def display_using_pydot(graph):
    plt.subplots(figsize=(20, 20))
    # Setting layout with added PYDOT function
    layout = nx.nx_pydot.pydot_layout(graph, prog='dot')
    # Nodes drawing
    nx.draw_networkx_nodes(graph, pos=layout, node_color='red', node_size=800, alpha=0.9)
    # Edges drawing
    nx.draw_networkx_edges(graph, pos=layout, edge_color='blue', alpha=0.5)
    # Label styling
    nx.draw_networkx_labels(graph, pos=layout, font_size=12, font_color='black', font_family='sans-serif', font_weight='bold')
    # Displaying the graph figure
    plt.show()
display_using_pydot(T)
USING SEABORN LIBRARY TO CREATE A HEATMAP
Additionally, I have used the Seaborn library to visualize the correlation matrix as a heatmap. Positive correlation coefficients are shown in warmer colors (from red up to white) and negative correlation coefficients in darker colors (from red down to black). In this picture we can find patterns by color, which the human eye distinguishes more easily than the network itself. I think, however, that the two methods complement each other in spotting patterns and so help us better understand the dataset.
import seaborn as sns
%matplotlib inline
plt.figure(figsize=(15, 15))
sns.heatmap(correlation,
            xticklabels=correlation.columns,
            yticklabels=correlation.columns)